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BACKGROUND OF THE INVENTION 

Function approximation is a term applied to a process of producing a rule 
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that can be used by a computer to figure out what to do when it meets a new set 
of circumstances. For example, function approximation may be used to evaluate 
the credit risk of credit card applicants. In this case, the function that we wish 
to approximate, called the target function, relates information about a credit 
card applicant to the most profitable credit limit for the credit card issuer. The 
exact form of this target function is not known to us, so we must approximate 
it with a hypothesis function, which a computer can use to set credit limits for 
applicants. If the hypothesis function is a good approximation of the target 
function, then the computer will produce credit limits that lead to good profits. 

For another example, function approximation may be used to aid medical 
decisions. In this case, the target function may relate medical test results and 
other patient data to the fraction of a tumor that will be destroyed if a given 
treatment is used. If we can use function approximation to develop a hypothesis 
function that closely resembles the target function, then a computer system can 
aid the decision regarding which type of treatment a doctor and patient should 
choose. 

For another example, function approximation may be used to develop a 
vehicle that drives itself. In this case, the target function may relate images 
from video cameras mounted on the vehicle to the brake pressure that should 
be applied for safe driving. A hypothesis function that closely approximates 
the target function could be implemented in a computer system onboard the 
vehicle. 



In each of these examples, it is important to develop a hypothesis function 
that closely approximates the target function. It is also important to evaluate 
how well the developed hypothesis function approximates the target function. 
This evaluation is called validation. 

Fusion is a method of function approximation in which multiple functions, 
called basis functions, are combined to develop a hypothesis function. Fusion is 
useful because it can combine a variety of development efforts, with a variety of 
strengths, to form a single hypothesis function. The invention described here is 
a process to produce a hypothesis function through fusion and to validate the 
hypothesis function. 

In function approximation, there is a target function that we do not know 
how to compute, and there is a distribution over the input space of the target 
function. For example, the input distribution could consist of images produced 
by a video camera mounted in a car, and the target function could be the brake 
pressure applied by a safe driver in response to the situation depicted in each 
image. 

We have a set of in-sample examples with inputs drawn according to the 
input distribution and outputs determined by the target function. We also 
have a set of out-of-sample inputs drawn according to the input distribution. 
In the braking example, in-sample examples could be collected by recording 
video images and the corresponding brake pressure while a human drives the 
car. Out-of-sample inputs could be collected by recording video images under a 
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variety of driving conditions. 

The primary goal of function approximation is to use the in-sample examples 
to develop a hypothesis function that closely approximates the target function 
over out-of-sample inputs. The capability of a hypothesis function to closely 
approximate the target function over inputs not used to develop the hypoth- 
esis function is called generalization. In the braking example, the hypothesis 
function could be implemented by a computer system that receives video input 
and produces an output signal that communicates the desired brake pressure 
to a brake actuator. The goal is to use the in-sample examples to develop a 
computer system that mimics a safe human driver under a variety of conditions. 

Another goal is to evaluate how well the hypothesis function generalizes, i.e., 
how well the hypothesis function approximates the target function over the out- 
of-sample inputs. The process of evaluating generalization is called validation. 
In the braking example, we wish to evaluate how well the computerized system 
mimics a safe human driver. This information allows us to either judge the 
system unsafe or deploy it with confidence. 

Fusion is one method to develop a hypothesis function for function approxi- 
mation. In fusion, the in-sample data are used to develop basis functions. Then 
a mixing function is developed. The mixing function combines the outputs of 
the basis functions into a single output to form the hypothesis function. In the 
braking example, several research groups can use different methods to develop 
different systems to control braking. These systems implement basis functions. 



Then another research group can develop a system that combines the outputs of 
the other systems into a single output. The system that combines outputs im- 
plements the mixing function. The combined systems implement the hypothesis 
function formed by fusion. 

Now we describe prior art. There are many prior methods to develop a 
hypothesis function through fusion of basis function outputs. For these methods, 
computing an error bound for the hypothesis function entails a tradeoff between 
generalization and validation, as follows. 

One prior method to validate the hypothesis function formed by fusion is to 
withhold some in-sample data from the development of the hypothesis function, 
then use the performance of the hypothesis function on the withheld data to 
compute an error bound. This method has the disadvantage that the withheld 
data are not used to develop the hypothesis function. As a result, the hypothesis 
function formed by this method is generally a worse approximation of the target 
function than a hypothesis function developed using all in-sample data. So 
generalization, which is the primary goal of function approximation, tends to 
suffer under this method. 

Another prior method to validate the hypothesis function formed by fusion 
is to use all in-sample data in all steps of developing the hypothesis function, 
then use the performance of the hypothesis function over the in-sample data to 
compute an error bound. In this case, the in-sample data are not independent 
of the hypothesis function, since they are used to develop it. So the error bound 



must be based on a statistical framework that uses uniform error bounds over the 
class of all possible hypothesis functions that might have been developed. (This 
class is independent of the in-sample data.) The class-based error bounds are 
weaker than error bounds based on a single function or a small set of functions. 
So validation, which is the secondary goal of function approximation, tends to 
suffer under this method. 

A technique called validation by inference eliminates the tradeoff between 
validation and generalization. Validation by inference allows all data to be 
used in the development of the hypothesis function while allowing computation 
of error bounds based on a small set of basis functions rather than a large 
class of functions. A prior process that uses validation by inference to obtain 
an error bound for a hypothesis function formed by fusion is detailed in the 
Provisional Application Serial No. 60/156,676, which is hereby incorporated by 
reference. The prior process computes an error bound for a given hypothesis 
function, but it does not develop a hypothesis function. Specifically, it does not 
determine a hypothesis function that minimizes the error bound over a class 
of prospective hypothesis functions. Also, the prior method entails solving a 
large mathematical program, having at least as many variables as the number 
of out-of-sample data inputs. The mathematical program uses a discretization 
technique that results in a tradeoff between program size and accuracy, so a 
very large mathematical program is required for very accurate validation. 



BRIEF SUMMARY OF THE INVENTION 



Given a set of basis functions, a set of inputs, and uniform squared error 
bounds for the basis functions over the inputs, the process determines a hypoth- 
esis function with minimum squared error bound among hypothesis functions 
formed by convex combinations of the basis function outputs. The error bound 
is based on uniform error bounds over the set of basis functions rather than 
uniform error bounds over the entire class of prospective hypothesis functions. 
The process operates as follows. 

For each basis function, the vector of outputs over the out-of-sample inputs 
is computed. This vector corresponds to a point in Euclidean space with a 
dimension for each out-of-sample example. The basis function error bound 
constrains the vector of target function outputs to a closed ball centered at the 
point. Since the basis function bounds are uniform, the target function point is 
constrained to the intersection of the basis function constraint balls. 

Given a hypothesis function, we can infer an error bound as follows. First, 
produce the vector of hypothesis function outputs over the out-of-sample in- 
puts. Then find a point in the intersection of basis function constraint balls 
with maximum distance from the hypothesis function point. The square of the 
distance is a bound on the sum of squared distances between hypothesis func- 
tion and target function outputs. Divide by the number of examples to produce 
a mean-squared error bound for the hypothesis function. 
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The center of a ball of minimum radius that includes the intersection of con- 
straint balls corresponds to the outputs of a hypothesis function with minimum 
error bound. The process identifies a hypothesis function with minimum error 
bound among those formed by convex combinations of basis function outputs. 
To do this, a quadratic program is used to minimize the radius over convex 
combinations of constraint balls. The process produces a hypothesis function 
with error bound at least as low as the minimum basis function error bound, 
since the basis functions are among the feasible solutions. 

The process applies a combination of three ideas to the problem of developing 
a hypothesis function through fusion and computing an error bound for the 
hypothesis function. The first idea is to minimize an error bound as a strategy 
to minimize error. The second idea, called transductive inference, is to use 
knowledge of out-of-sample inputs to develop a hypothesis function. The third 
idea, called validation by inference, is to use uniform error bounds over a small 
set of functions to imply uniform error bounds over an entire class of functions. 

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING 

FIG. 1 is a flowchart that illustrates some steps in the process that is the 
invention; notes in boxes describe process steps, and notes beside arrows describe 
data transferred between steps. 



DETAILED DESCRIPTION OF THE INVENTION 



In the first subsection we review uniform error bounds. In the next sub- 
section, we discuss the intuition behind the error bound minimization method. 
Then we present the method. We end with a discussion of methods to obtain 
out-of-sample inputs. 

Review of Uniform Validation 

Denote the basis functions by #i,...,5a/- Denote the target function by 
/. Define the error of function g over a data set as the average of the squared 
differences between g and / over the inputs. Assume that the squared difference 
over each input is no greater than t. 

Let v be the error score of a basis function over the validation data, and 
let v l be the (unknown) error score over the out-of-sample data. Let D be 
the number of validation examples, and let N be the number of out-of-sample 
examples. Denote the negatives of the error scores on individual validation 
examples using random variables X\, . . . , Denote the error scores on out- 
of-sample examples using random variables Xq+\, . . . , Xq+n. Note that 

PrW >v + et}= Pr{ + - + + X D+1 + ..^ + X D+N > et} (l) 

We will use results from the paper "Probability Inequalities for Sums of Bounded 
Random Variables," in the American Statistical Association Journal, 1963, 



pages 13 to 30, by W. Hoeffding, which is hereby incorporated in its entirety by 
reference. Using a result by Hoeffding from the paper, page 16, Equation 2.7, 

Pr{i/ > v 4 et) < et^+N- 1 ) . (2) 

If D — iV, then we can derive a stronger and simpler result. In this case, 

Pv{v' >u + et} = Pr{ Xl + ' ^ + * 2N > et}. (3) 

So 

Pr{,' > v + et} = Pr{ Xl + - 2N +X2N > \t}. (4) 
Using another result from the paper by Hoeffding, page 16, Equation 2.6, 

Pr{v' >v + et} <e~ 2Ne \ (5) 

For the remainder of this paper, assume D = N. In practice, if D ^ TV, then 
substitute (2) for (5). 

Let u m and represent the validation and out-of-sample error scores for 
basis function Using the sum of probabilities to bound the probability of 
the union event, 

Pr{i/J > i/i + et or ... or v' M > v M 4 et) < Me~ 2Nc \ (6) 

In other words, with probability at least 1 — Me~ 2Ne2 , for all basis functions, the 
out-of-sample error score is at most et greater than the validation error score. 
Refer to the out-of-sample error bound i/ m 4 et as 6 m . 

Intuition 



10 



We begin with a small example. Assume we have two basis functions, g\ 
and Q2, and two out-of-sample inputs, xi and X2. Also, assume that we have 
obtained the following uniform error bounds for the basis functions. 

2 

Pr{Vm € {1, 2}| J> m (xi) - /(x,)] 2 < 4} > 0.90. (7) 

In other words, with at least 90% probability, the sum of squared errors is 4 
or less for both basis functions. Finally, assume that g\ (xi) = 0, £/i(x2) = 0, 
52 (xi) = 3, and s 2 (*2) =0. 

The error bound for g\ implies that (/(xi), /(X2)) is constrained to the 
closed disk of radius 2 centered at (0,0): 



The error bound for #2 implies that (/(xi), /(x 2 )) is constrained to the closed 
disk of radius 2 centered at (3,0): 



The uniform bounds imply that (/(xi), /(x 2 )) is in the intersection of these 
disks. 

For any pair of hypothesis function values (A(xi), /i(x 2 )), the sum of squared 
differences between the hypothesis function and the target function can be no 
greater than the square of the maximum distance from the hypothesis function 
point to a point in the intersection of basis function constraint disks. Dividing 
this squared distance by two gives an error bound for the hypothesis function. 



/( Xl ) 2 +/(x 2 ) 2 <4. 



(8) 



[/( Xl )-3] 2 + /(x 2 ) 2 <4. 



(9) 
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The hypothesis function outputs that minimize the error bound correspond 
to the center of the minimum radius disk that covers the intersection of the 
constraint disks. (The center of this hypothesis function disk minimizes the 
maximum distance to a point in the intersection.) 

To find the minimizing hypothesis function outputs and minimum error 
bound for our example, let y\ = /(xi) and 2/2 = /(x2)- Then the constraint 
disks can be written as follows. 



Any point that is in the intersection satisfies both constraints, so it also satisfies 
any convex combination of the constraints: 



where 9 € [0,1]. 

For each value of 0, the convex combination corresponds to a disk constraint. 
To see this, rewrite the convex combination in constraint disk form. First, 
expand the square and collect terms. 



vl + vl ~ 4 < 0 and ( Vl - 3) 2 + y\ - 4 < 0 



(10) 



(1 - e)[y\ + y 2 2 -4] +*[( yi - 3) 2 + y 2 2 -4] < 0 



(11) 



{y\ - 602,1 + 99) + y\ - 4 < 0. 



(12) 



Now complete the square. 



{y\ - 60yi + 90 2 ) + (90 - 96 2 ) + y\ - 4 < 0. 



(13) 



The constraint disk is 



(yi-30) 2 + i/l -[4 -90(1-0)] <0. 



(14) 



12 



The center of the disk is (39, 0). The square of the radius is 4 — 99(1 — 0), which 
is minimum when 9 = ~. So the hypothesis function values with minimum error 
bound are (fc(xi), h(x2)) = (§,0); the error bound is \ • |, and the minimizing 
hypothesis function is \q\ + \q2- 

In general, there are N out-of-sample inputs xi, . . . ,x# and M basis func- 
tions <7i, . . . , £Jm- In ij^, the bound for basis function g is ball with radius equal 
to the square root of the bound on the sum of squared differences between 
g and the target function /. Uniform basis function bounds imply that the 
target function values (/(xi), . . . ,/(x#)) are constrained to the intersection of 
the basis function balls. Hence, for a hypothesis function /i, the distance from 
(ft(xi), . . . , /i(xat)) to the most distant point in the intersection implies an error 
bound for the hypothesis function. The hypothesis function values that mini- 
mize this bound correspond to the center of the radius of the minimum-radius 
ball that contains the intersection. 

Algorithm 

For m € {1, . . . ,M} and n G {1, . . . , JV}, let y n - /(x n ); let c mn = s m (x n ), 
and let denote the bound on the sum of squared errors for g m . For each 
basis function # m , the constraint inequality is 

N 

- c mn ) 2 - ri < 0. (15) 

71=1 

Each point in the intersection of the basis function constraint balls satisfies the 
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following convex combination of basis function' constraint inequalities. 

M N 
m=l n—l 

where 9 > 0 and 0 2 + . . . + 0Af = 1- 

Rewrite the convex combination of constraint balls as a single constraint ball 
with center (ci, . . . , cat) and radius r: 

N 

$>n-c„) 2 -r 2 <0. (17) 

71=1 

To do this, first expand the squares in Expression 16. 

M N 

E M£(2/2 - 2c mn y n + c 2 mn ) - rl\ < 0. (18) 

m=l n=l 

Reverse the order of summation. 

N M MM 

J2 [vl - 2( £ *.»C™0l/n + E ^ C -"l " E *™ r m ^ 0 " ( 19 ) 

n=l m— 1 m=l m=l 

Let 

M 

Cn = 51 ^rnCmn- (20) 
m=l 

Then we have 

TV N M M 

E - 2c n2/n ] + E E - E Oral** < 0. (21) 

71=1 71=1 771=1 m=l 

Complete the squares. 

N N N M M 

Eb 2 - 2cny n + 4]-E c n + EE - E *"» r m ^ °- ( 22 ) 

n=l n=l n—l m=l m=l 

Collect terms and expand (? n to get 

TV N M N M M 

E(2/n " Cn) 2 - E( E ^^) 2 + E E " E ^ ^ 0 " ( 23 ) 

n—l n=l m=l n=l m=l m=l 
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By pattern-matching with Expression 17, 

M N M M 

r 2 = J2 ^ + £((£ ^ c -) 2 - E ^ C -nl- (24) 

m=l n=l m=l m—1 

Our goal is to find the convex combination 9 that minimizes r 2 . Note that 
r 2 is quadratic in 0 and that the quadratic part is positive semidefinite. So 
we may use quadratic programming. Written in standard form, the quadratic 
program is: 

minimize & A9 + b'0 (25) 
subject to 6>i + ... -f 9 M = 1 and 9 > 0, (26) 
where i is an M x M matrix with 

N 

G>ij = ^ ^ C-inC-nj > (^^) 
n=l 

and b is an M- vector with 

&i=»?-£c? B . (28) 

n=l 

Denote the solution by r 2 and the minimizing convex combination by 9. 
Then the minimum radius ball that includes the intersection of basis function 
constraint balls (and has a center that is a convex combination of constraint 
ball centers) has radius f 2 and center given by Equation 20: 

M M 
C = 9 m C mU . . . , QrnCmN). (29) 

m=l m—1 

Equivalently, the minimum inferred bound on the sum of squared errors is r 2 , 
and the minimizing hypothesis function is 

h = 9igi + ... + 9 M gM. (30) 
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To recap, the steps of the algorithm are as follows. 

1. Form the quadratic program of Equations 25, 26, 27, and 28. This is a 
quadratic program with variables corresponding to convex combination 
weights, with constraints based on basis function outputs and basis func- 
tion error bounds over the out-of-sample inputs, and with an objective 
function corresponding to an error bound for the function formed by a 
convex combination of basis function outputs. 

2. Solve the quadratic program to determine a minimizing convex combina- 
tion and a minimum of the objective function. 

3. Form the hypothesis function by using the minimizing convex combination 
as weights on the basis function outputs. 

4. Return the minimum of the objective function as a bound on the sum of 
squared errors for the hypothesis function over the out-of-sample inputs. 



Obtaining Out-of-Sample Inputs 



The invention detailed here uses out-of-sample inputs, i.e., inputs drawn ac- 
cording to the input distribution and not used to develop the basis functions. 
In some cases, the out-of-sample inputs are available directly before basis func- 
tion training begins. In other cases, a means of producing out-of-sample inputs 
according to the input distribution is available. In yet other cases, an option 
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is to collect a set of inputs to which the hypothesis function is to be applied, 
use the set as the out-of-sample inputs to develop the hypothesis function, and 
then apply the hypothesis function to those inputs. Examples of the first two 
cases follow. 

An example of some out-of-sample inputs being available directly is as fol- 
lows. Suppose there is a database containing information on patients having a 
condition. There is a new drug available for the condition. Various tests have 
been performed to determine the relationship between data about a patient and 
how much the drug will help the patient. Each test was performed on a set 
of patients chosen at random from the database, and each test yielded a basis 
function. In this case, the data for the set of patients who were not chosen for 
tests is an available set of out-of-sample inputs. 

An example of having a means of producing out-of-sample inputs according 
to the input distribution is as follows. Suppose the goal is to develop a hypoth- 
esis function that is the brake pressure applied by a safe driver in response to 
situations depicted in images from a video camera mounted on a car. Then a 
method to obtain out-of-sample inputs is to collect images while a safe driver 
drives the car in the same sorts of environments and conditions as those in which 
the hypothesis function will be used. 
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